Smart Paper Notes

On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

Ying Wen , Ziyu Wan , Ming Zhou , Shufang Hou , Zhe Cao , Chenyang Le , Jingxiao Chen , Zheng Tian , Weinan Zhang , Jun Wang

Categories: Artificial Intelligence | Machine Learning

2022-12-24

Our situated environment is full of uncertainty and highly dynamic, which hinders the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real-world scenarios. This means IDM should be capable of continuously learning new skills and efficiently generalizing across wider applications. IDM benefits from any new approaches and theoretical breakthroughs that exhibit Artificial General Intelligence (AGI), breaking the barriers between tasks and applications. Recent research has thoroughly examined the Transformer neural architecture as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as a sequence decoding task using the Transformer architecture; this would be a promising solution to advance the applications of IDM in more complex real-world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of an FDM in multi-agent game AI, production scheduling, and robotics tasks. Finally, through a case study, we demonstrate our realization of the FDM, DigitalBrain (DB1) with 1.2 billion parameters, which achieves human-level performance over 453 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step towards more autonomous and efficient real-world IDM applications.

CTT-Net: A Multi-view Cross-token Transformer for Cataract Postoperative Visual Acuity Prediction

Jinhong Wang , Jingwen Wang , Tingting Chen , Wenhao Zheng , Zhe Xu , Xingdi Wu , Wen Xu , Haochao Ying , Danny Chen , Jian Wu

Categories: Computer Vision

2022-12-12

Surgery is the only viable treatment for cataract patients with visual acuity (VA) impairment. Clinically, to assess the necessity of cataract surgery, it is crucial to accurately predict postoperative VA before surgery by analyzing multi-view optical coherence tomography (OCT) images. Unfortunately, due to complicated fundus conditions, determining postoperative VA remains difficult for medical experts. Deep learning methods for this problem have been developed in recent years. Although effective, these methods still face several issues: they do not efficiently explore potential relations between multi-view OCT images, they neglect the key role of clinical prior knowledge (e.g., the preoperative VA value), and they use only regression-based metrics, which lack a reference. In this paper, we propose a novel Cross-token Transformer Network (CTT-Net) for postoperative VA prediction that analyzes both the multi-view OCT images and the preoperative VA. To effectively fuse multi-view features of OCT images, we develop cross-token attention that can restrict redundant or unnecessary attention flow. Further, we use the preoperative VA value to provide more information for postoperative VA prediction and to facilitate fusion between views. Moreover, we design an auxiliary classification loss to improve model performance and assess VA recovery more fully, avoiding the limitation of using only regression metrics. To evaluate CTT-Net, we build a multi-view OCT image dataset collected from our collaborating hospital. Extensive experiments validate the effectiveness of our model compared to existing methods on various metrics. Code is available at: https://github.com/wjh892521292/Cataract OCT.

Monitoring Vegetation From Space at Extremely Fine Resolutions via Coarsely-Supervised Smooth U-Net

Joshua Fan , Di Chen , Jiaming Wen , Ying Sun , Carla P. Gomes

Categories: Computer Vision | Artificial Intelligence

2022-07-16

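Monitoring vegetation productivity at extremely fine resolutions is valuable for real-world agricultural applications, such as detecting crop stress and providing early warning of food insecurity. Solar-Induced Chlorophyll Fluorescence (SIF) provides a promising way to measure plant productivity directly from space. However, satellite SIF observations are only available at a coarse spatial resolution, making it impossible to monitor how individual crop types or farms are doing. This poses a challenging coarsely-supervised regression (or downscaling) task: at training time, we only have SIF labels at a coarse resolution (3 km), but we want to predict SIF at much finer spatial resolutions (e.g., 30 m, a 100x increase). We also have additional fine-resolution input features, but the relationship between these features and SIF is unknown. To address this, we propose Coarsely-Supervised Smooth U-Net (CS-SUNet), a novel method for this coarse supervision setting. CS-SUNet combines the expressive power of deep convolutional networks with novel regularization methods based on prior knowledge (such as a smoothness loss) that are crucial for preventing overfitting. Experiments show that CS-SUNet resolves fine-grained variations in SIF more accurately than existing methods.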
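
The coarse-supervision idea above can be illustrated with a small loss function. This is only a sketch under stated assumptions, not the CS-SUNet implementation: the function name, the `smooth_weight` parameter, and the neighbor-difference form of the smoothness penalty are all hypothetical. It compares the mean of the fine-resolution predictions inside one coarse tile with that tile's single label, then adds a penalty on differences between adjacent pixels.

```python
def coarse_supervised_loss(fine_pred, coarse_label, smooth_weight=0.1):
    """Loss for one coarse tile.

    fine_pred: 2-D list of fine-resolution predictions covering the tile.
    coarse_label: the single coarse-resolution label for the tile.
    smooth_weight: hypothetical weight on the smoothness penalty.
    """
    rows, cols = len(fine_pred), len(fine_pred[0])

    # Coarse term: the tile's mean prediction should match the coarse label.
    tile_mean = sum(sum(row) for row in fine_pred) / (rows * cols)
    coarse_term = (tile_mean - coarse_label) ** 2

    # Smoothness term (assumed form): mean squared difference between
    # each pixel and its right and lower neighbors.
    sq_diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                sq_diffs.append((fine_pred[r][c + 1] - fine_pred[r][c]) ** 2)
            if r + 1 < rows:
                sq_diffs.append((fine_pred[r + 1][c] - fine_pred[r][c]) ** 2)
    smooth_term = sum(sq_diffs) / len(sq_diffs) if sq_diffs else 0.0

    return coarse_term + smooth_weight * smooth_term
```

Two prediction maps with the same tile mean receive the same coarse term, so only the regularizer separates them. This is why the abstract describes regularization as crucial in this setting.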

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Muning Wen , Jakub Grudzien Kuba , Runji Lin , Weinan Zhang , Ying Wen , Jun Wang , Yaodong Yang

Categories: Machine Learning

2022-05-30

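Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and, more recently, reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the prosperous development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into an SM problem, wherein the task is to map the agents' observation sequence to the agents' optimal action sequence. Our goal is to build a bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to MAT is an encoder-decoder architecture that leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision-making process. This yields only linear time complexity for multi-agent problems and, most importantly, endows MAT with a monotonic performance improvement guarantee. Unlike prior art such as Decision Transformer, which fits only pre-collected offline data, MAT is trained by online trial and error in the environment. To validate MAT, we conduct extensive experiments on the StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-shot learner on unseen tasks regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.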
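
For reference, the multi-agent advantage decomposition that the abstract invokes is usually written along the following lines. The notation is an assumption taken from the general MARL literature rather than from this abstract: agents $i_1, \dots, i_n$ in an arbitrary order, joint observation $o$, joint policy $\pi$, and $a^{i_{1:m}}$ for the actions of the first $m$ agents in that order.

```latex
A_{\pi}^{i_{1:n}}\left(o, a^{i_{1:n}}\right)
  = \sum_{m=1}^{n} A_{\pi}^{i_m}\left(o, a^{i_{1:m-1}}, a^{i_m}\right)
```

Read this way, each agent $i_m$ can choose an action with positive advantage given the actions already chosen by $i_1, \dots, i_{m-1}$, which is what allows a decoder to emit actions one agent at a time.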

Automated assessment of disease severity of COVID-19 using artificial intelligence with synthetic chest CT

Mengqiu Liu , Ying Liu , Yidong Yang , Aiping Liu , Shana Li , Changbing Qu , Xiaohui Qiu , Yang Li , Weifu Lv , Peng Zhang

Categories: Computer Vision

2021-12-11

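Background: Triage of patients is important for controlling the coronavirus disease 2019 (COVID-19) pandemic, especially during the peak of the pandemic when clinical resources become extremely limited. Purpose: To develop a method that automatically segments and quantifies lung and pneumonia lesions using synthetic chest CT, and to assess disease severity in COVID-19 patients. Materials and Methods: In this study, we incorporated data augmentation to generate synthetic chest CT images from publicly available datasets (285 datasets from "Lung Nodule Analysis 2016"). The synthetic images and masks were used to train a 2D U-Net neural network, which was tested on 203 COVID-19 datasets to generate lung and lesion segmentations. Disease severity scores (DL: damage load; DS: damage score) were calculated from the segmentations. Correlations between DL/DS and clinical lab tests were evaluated using Pearson's method. A p-value < 0.05 was considered statistically significant. Results: Automatic lung and lesion segmentations were compared with manual annotations. For lung segmentation, the median values of the Dice similarity coefficient, Jaccard index, and average surface distance were 98.56%, 97.15%, and 0.49 mm, respectively. The same metrics for lesion segmentation were 76.95%, 62.54%, and 2.36 mm, respectively. Significant (p << 0.05) correlations were found between DL/DS and percentage lymphocyte tests, with r-values of -0.561 and -0.501, respectively. Conclusion: An AI system based on thoracic radiography and data augmentation was proposed to segment lungs and lesions in COVID-19 patients. Correlations between imaging findings and clinical lab tests suggest the value of this system as a potential tool for assessing COVID-19 disease severity.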
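
The two overlap metrics reported above can be computed for a pair of binary masks as in the sketch below. The function name and the flat 0/1 list representation are illustrative assumptions, and returning 1.0 for two empty masks is a convention chosen here, not something the abstract specifies.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice coefficient, Jaccard index) for two binary masks.

    pred, truth: equal-length flat sequences of 0/1 labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")

    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection

    # Convention (assumed): two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_size + truth_size)
    jaccard = intersection / union
    return dice, jaccard
```

For a single pair of masks the two values are tied by J = D / (2 - D), so they rank segmentations identically. Summary statistics such as medians over many cases need not satisfy that identity exactly.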

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks

Linghui Meng , Muning Wen , Yaodong Yang , Chenyang Le , Xiyun Li , Weinan Zhang , Ying Wen , Haifeng Zhang , Jun Wang , Bo Xu

Categories: Machine Learning | Artificial Intelligence

2021-12-06

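Offline reinforcement learning leverages static datasets to learn optimal policies without needing access to the environment. This technique is desirable for multi-agent learning tasks because of the expense of agents' online interactions and the large number of samples required during training. Yet in multi-agent reinforcement learning (MARL), the paradigm of offline pre-training with online fine-tuning has never been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we try to answer the question of whether offline pre-training in MARL can learn general policy representations that help improve performance on multiple downstream tasks. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraftII environment, and then propose a novel architecture, the multi-agent decision transformer (MADT), for effective offline learning. MADT leverages the Transformer's ability to model temporal representations and integrates it with both offline and online MARL tasks. A crucial benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. When evaluated on the StarCraftII offline dataset, MADT demonstrates superior performance over state-of-the-art offline RL baselines. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and enjoys strong performance even in zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements in MARL.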

Unsupervised COVID-19 Lesion Segmentation in CT Using Cycle Consistent Generative Adversarial Network

Chengyijue Fang , Yingao Liu , Mengqiu Liu , Xiaohui Qiu , Ying Liu , Yang Li , Jie Wen , Yidong Yang

Categories: Computer Vision

2021-11-23

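COVID-19 has become a global pandemic and still poses a severe health risk to the public. Accurate and efficient segmentation of pneumonia lesions in CT scans is vital for treatment decision-making. We propose a novel unsupervised approach using a cycle-consistent generative adversarial network (cycle-GAN) that automates and accelerates the process of lesion delineation. The workflow includes lung volume segmentation, "synthetic" healthy lung generation, infected and healthy image subtraction, and binary lesion mask creation. The lung volume is first delineated using a pre-trained U-Net and serves as the input to the subsequent network. The cycle-GAN is developed to generate synthetic "healthy" lung CT images from infected lung images. After that, the pneumonia lesions are extracted by subtracting the synthetic "healthy" lung CT images from the "infected" lung CT images. A median filter and K-means clustering are then applied to contour the lesions. The automatic segmentation approach was validated on two public datasets (Coronacases and Radiopedia). The Dice coefficients reached 0.748 and 0.730 for the Coronacases and Radiopedia datasets, respectively. Meanwhile, the precision and sensitivity of lesion segmentation are 0.813 and 0.735 for the Coronacases dataset, and 0.773 and 0.726 for the Radiopedia dataset. The performance is comparable to existing supervised segmentation networks and outperforms previous unsupervised ones. The proposed unsupervised segmentation method achieves high accuracy and efficiency in automatic COVID-19 lesion delineation. The segmentation result can serve as a baseline for further manual modification and as a quality assurance tool for lesion diagnosis. Furthermore, because the method is unsupervised, the result is not influenced by physicians' experience, which is otherwise crucial for supervised methods.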
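
The post-processing steps described above (subtraction, median filtering, clustering into a binary mask) can be sketched in plain Python. This is a toy illustration under assumptions, not the authors' pipeline: images are small 2-D lists, the filter is a fixed 3x3 window clipped at the borders, and the K-means step is reduced to two clusters over scalar intensities. All function names are hypothetical.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each pixel with the median of its 3x3 neighborhood (clipped at borders)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = median(window)
    return out


def two_means_threshold(values, iters=20):
    """1-D K-means with k=2; returns the midpoint between the two cluster centers."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        cut = (lo + hi) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        if not low or not high:
            break  # only one cluster is populated; keep current centers
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def lesion_mask(infected, synthetic_healthy):
    """Subtract the synthetic healthy image, denoise, and binarize the residual."""
    diff = [[a - b for a, b in zip(row_i, row_h)]
            for row_i, row_h in zip(infected, synthetic_healthy)]
    smooth = median_filter_3x3(diff)
    cut = two_means_threshold([v for row in smooth for v in row])
    return [[1 if v > cut else 0 for v in row] for row in smooth]
```

In this sketch the median filter removes isolated residual pixels before clustering, at the cost of eroding the corners of small regions.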

Neural Auto-Curricula

Xidong Feng , Oliver Slumbers , Ziyu Wan , Bo Liu , Stephen McAleer , Ying Wen , Jun Wang , Yaodong Yang

Categories: Artificial Intelligence

2021-06-04

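When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterize the opponent selection module by neural networks and the best-response module by optimization subroutines, and update their parameters solely via interaction with the game engine, where the players aim to minimize their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance than state-of-the-art population-based game solvers on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalize from small games to large games, for example by training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction of discovering general MARL algorithms solely from data.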
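
The exploitability objective mentioned above can be made concrete for a small matrix game. The sketch below assumes one common definition, which the abstract itself does not spell out: for a zero-sum game with row-player payoff matrix `payoff`, exploitability is the sum of what each player could gain by switching to a best response against the other's mixture. The function name and argument layout are illustrative.

```python
def exploitability(payoff, row_mix, col_mix):
    """Exploitability of a strategy profile in a two-player zero-sum matrix game.

    payoff[i][j]: payoff to the row player (the column player receives the negative).
    row_mix, col_mix: mixed strategies as probability lists.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])

    # Expected row-player payoff of each pure row against the column mixture.
    row_values = [sum(payoff[i][j] * col_mix[j] for j in range(n_cols))
                  for i in range(n_rows)]
    # Expected row-player payoff when the column player answers with each pure column.
    col_values = [sum(row_mix[i] * payoff[i][j] for i in range(n_rows))
                  for j in range(n_cols)]

    # Row player's best response maximizes; column player's best response minimizes.
    return max(row_values) - min(col_values)
```

In rock-paper-scissors, the uniform profile has exploitability 0 (a Nash equilibrium), while a row player who always plays rock against a uniform opponent gives exploitability 1, because the opponent can switch to always playing paper.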

Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges

Jian Ding , Nan Xue , Gui-Song Xia , Xiang Bai , Wen Yang , Michael Ying Yang , Serge Belongie , Jiebo Luo , Mihai Datcu , Marcello Pelillo

Categories: Computer Vision

2021-02-24

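In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances in 18 categories with oriented-bounding-box annotations, collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1,300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library, and the challenges can facilitate the design of robust algorithms and reproducible research on the problem of object detection in aerial images.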
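
As a small illustration of what an oriented bounding box encodes, the sketch below converts a center, size, and rotation angle into four corner points. The (cx, cy, w, h, angle) parameterization and the function name are assumptions made for illustration. The abstract does not say how DOTA stores its annotations, and this is not the dataset's file format.

```python
import math


def obb_corners(cx, cy, w, h, angle_rad):
    """Corner points of a box of size w x h centered at (cx, cy), rotated by angle_rad.

    Corners are returned in order around the box, starting from the corner
    that is at (-w/2, -h/2) before rotation.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_extents = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset about the origin, then translate to the center.
    return [(cx + x * cos_a - y * sin_a, cy + x * sin_a + y * cos_a)
            for x, y in half_extents]
```

With a zero angle this reduces to an ordinary axis-aligned box, while a quarter turn swaps the roles of width and height. Axis-aligned boxes cannot express that distinction for objects seen from above at arbitrary headings.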

Analogical Inference Enhanced Knowledge Graph Embedding

Yao Zhen , Zhang Wen , Chen Mingyang , Huang Yufeng , Yang Yi , Chen Huajun

Categories: Artificial Intelligence | Natural Language Processing

2023-01-03

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGE models to infer inductively. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework, AnKGE, to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity, relation, and triple levels. In AnKGE, we train an analogy function for each level of analogical inference, which takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
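
The score interpolation described above can be sketched as a weighted sum. This is a minimal illustration under assumptions, not AnKGE's actual score function: the function names, the dictionary layout keyed by level, and the use of fixed numeric weights (the abstract says the weights are adaptive but not how they are computed) are all hypothetical.

```python
LEVELS = ("entity", "relation", "triple")


def combined_score(base_score, analogy_scores, weights):
    """Add weighted analogy scores from each level to the base KGE score.

    analogy_scores, weights: dicts keyed by the level names in LEVELS.
    """
    return base_score + sum(weights[level] * analogy_scores[level] for level in LEVELS)


def rank_candidates(candidates, weights):
    """Sort candidate names by combined score, highest first.

    candidates: dict mapping a candidate name to (base_score, analogy_scores).
    """
    return sorted(
        candidates,
        key=lambda name: combined_score(candidates[name][0], candidates[name][1], weights),
        reverse=True,
    )
```

Setting every weight to zero recovers the base model's ranking, so in this sketch the analogy terms act as an adjustment on top of the pre-trained KGE model rather than a replacement for it.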
